We analyze how an observer synchronizes to the internal state of a finite-state information source, using the ε-machine causal representation. Here, we treat the case of exact synchronization, when it is possible for the observer to synchronize completely after a finite number of observations. The more difficult case of strictly asymptotic synchronization is treated in a sequel. In both cases, we find that an observer, on average, will synchronize to the source state exponentially fast and that, as a result, the average accuracy in an observer's predictions of the source output approaches its optimal level exponentially fast as well. Additionally, we show here how to analytically calculate the synchronization rate for exact ε-machines and provide an efficient polynomial-time algorithm to test ε-machines for exactness.

I. INTRODUCTION



Synchronization and state estimation for finite-state sources is a central interest in several disciplines, including information theory, theoretical computer science, and dynamical systems [1]-[5]. Here, we study the synchronization problem for a class of finite-state hidden Markov models known as ε-machines [6]. These machines have the important property of unifilarity, meaning that the next state is completely determined by the current state and the next output symbol generated. Thus, if an observer is ever able to synchronize to the machine's internal state, it remains synchronized forever using continued observations of the output. As we will see, our synchronization results also have important consequences for prediction. The future output of an ε-machine is a function of the current state, so better knowledge of its state enables an observer to make better predictions of the output.

II. BACKGROUND

This section provides the necessary background for our results, including information-theoretic measures of prediction for stationary information sources and formal definitions of ε-machines and synchronization. In particular, we identify two qualitatively distinct types of synchronization: exact (synchronization via finite observation sequences) and asymptotic (requiring infinite sequences). The exact case is the subject here; the nonexact case is treated in a sequel [7].

A. Stationary Information Sources



Let $\mathcal{A}$ be a finite alphabet, and let $X_0, X_1, \ldots$ be the random variables (RVs) for a sequence of observed symbols $X_t \in \mathcal{A}$ generated by an information source. We denote the RVs for the sequence of future symbols beginning at time $t = 0$ as $\overrightarrow{X} = X_0 X_1 X_2 \cdots$, the block of $L$ symbols beginning at time $t = 0$ as $X^L = X_0 X_1 \ldots X_{L-1}$, and the block of $L$ symbols beginning at a given time $t$ as $X_t^L = X_t X_{t+1} \ldots X_{t+L-1}$. A stationary source is one for which $\Pr(X_t^L) = \Pr(X^L)$ for all $t$ and all $L > 0$.

We monitor an observer's predictions of a stationary 
source using information-theoretic measures [5], as re- 
viewed below. 

Definition 1. The block entropy $H(L)$ for a stationary source is:

$$H(L) \equiv H[X^L] = -\sum_{\{x^L\}} \Pr(x^L) \log_2 \Pr(x^L) .$$

The block entropy gives the average uncertainty in observing blocks $X^L$.

Definition 2. The entropy rate $h_\mu$ is the asymptotic average entropy per symbol:

$$h_\mu \equiv \lim_{L \to \infty} \frac{H(L)}{L} = \lim_{L \to \infty} H[X_L \mid X^L] .$$

Definition 3. The entropy rate's length-$L$ approximation is:

$$h_\mu(L) \equiv H(L) - H(L-1) = H[X_{L-1} \mid X^{L-1}] .$$

That is, $h_\mu(L)$ is the observer's average uncertainty in the next symbol to be generated after observing the first $L-1$ symbols.

For any stationary process, $h_\mu(L)$ monotonically decreases to the limit $h_\mu$ [8]. However, the form of convergence depends on the process. The lower the value of $h_\mu$ a source has, the better an observer's predictions of the source output will be asymptotically. The faster $h_\mu(L)$ converges to $h_\mu$, the faster the observer's predictions reach this optimal asymptotic level. If we are interested in making predictions after a finite observation sequence, then the source's true entropy rate $h_\mu$ and the rate of convergence of $h_\mu(L)$ to $h_\mu$ are both important properties of an information source.
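As an illustration of Defs. 1-3, the following minimal sketch computes $H(L)$ and $h_\mu(L)$ exactly for the Even Process example of Sec. II C by enumerating every word up to a fixed length. The machine representation (a dict mapping each symbol $x$ to its matrix $T^{(x)}$ as nested lists), the helper names, and the choice $p = 1/2$ are assumptions made for illustration only. Word probabilities use $\Pr(w) = \pi T^{(w_0)} \cdots T^{(w_{L-1})} \mathbf{1}$, and the enumeration is exponential in $L$, so this is meant only for small $L$. The comparison value $H_b(p)/(2-p)$, with $H_b$ the binary entropy, is $\sum_k \pi_k h_k$ for this machine (Eq. (22) of Prop. 2).

```python
import math

def even_process(p):
    """Even Process machine: symbol-labeled matrices T^(x) and stationary pi.

    Assumed representation: {symbol: N x N nested list}, states indexed 0, 1.
    """
    T = {
        "0": [[p, 0.0], [0.0, 0.0]],
        "1": [[0.0, 1.0 - p], [1.0, 0.0]],
    }
    pi = [1.0 / (2.0 - p), (1.0 - p) / (2.0 - p)]
    return T, pi

def block_entropies(T, pi, max_len):
    """Return [H(0), H(1), ..., H(max_len)] in bits.

    Words are enumerated depth first. Each word w carries the row vector
    pi T^(w_0) ... T^(w_{L-1}), whose entries sum to Pr(w).
    """
    n = len(pi)
    H = [0.0] * (max_len + 1)

    def visit(vec, depth):
        prob = sum(vec)
        if prob <= 0.0:
            return  # the machine cannot generate this word
        H[depth] -= prob * math.log2(prob)
        if depth < max_len:
            for Tx in T.values():
                nxt = [sum(vec[i] * Tx[i][j] for i in range(n)) for j in range(n)]
                visit(nxt, depth + 1)

    visit(list(pi), 0)
    return H

p = 0.5  # illustrative parameter value
T, pi = even_process(p)
H = block_entropies(T, pi, 14)
h_L = [H[L] - H[L - 1] for L in range(1, 15)]   # h_mu(1), ..., h_mu(14)
h_bin = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
h_mu = h_bin / (2 - p)                          # entropy rate of this machine
print(round(h_L[0], 4), round(h_L[-1], 4), round(h_mu, 4))
```

The sequence `h_L` decreases monotonically toward `h_mu`, as stated above; for this machine the gap is already about $10^{-2}$ by $L = 14$.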

B. Hidden Markov Models 

In what follows we restrict our attention to an im- 
portant class of stationary information sources known as 
hidden Markov models. For simplicity, we assume the 
number of states is finite. 

Definition 4. A finite-state edge-label hidden Markov machine (HMM) consists of

1. a finite set of states $\mathcal{S} = \{\sigma_1, \ldots, \sigma_N\}$,

2. a finite alphabet of symbols $\mathcal{A}$, and

3. a set of $N$ by $N$ symbol-labeled transition matrices $T^{(x)}$, $x \in \mathcal{A}$, where $T^{(x)}_{ij}$ is the probability of transitioning from state $\sigma_i$ to state $\sigma_j$ on symbol $x$. The corresponding overall state-to-state transition matrix is denoted $T = \sum_{x \in \mathcal{A}} T^{(x)}$.

A hidden Markov machine can be depicted as a directed graph with labeled edges. The nodes are the states $\{\sigma_1, \ldots, \sigma_N\}$ and for all $x, i, j$ with $T^{(x)}_{ij} > 0$ there is an edge from state $\sigma_i$ to state $\sigma_j$ labeled $p|x$ for the symbol $x$ and transition probability $p = T^{(x)}_{ij}$. We require that the transition matrices $T^{(x)}$ be such that this graph is strongly connected.

A hidden Markov machine $M$ generates a stationary process $\mathcal{P} = (X_L)_{L \ge 0}$ as follows. Initially, $M$ starts in some state $\sigma_{i^*}$ chosen according to the stationary distribution $\pi$ over machine states, the distribution satisfying $\pi T = \pi$. It then picks an outgoing edge according to the edges' relative transition probabilities $T^{(x)}_{i^* j}$, generates the symbol $x^*$ labeling this edge, and follows the edge to a new state $\sigma_{j^*}$. The next output symbol and state are consequently chosen in a similar fashion, and this procedure is repeated indefinitely.
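The generation procedure just described can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes a machine is stored as a dict mapping each symbol $x$ to $T^{(x)}$ (nested lists), `generate` is a hypothetical helper name, and weighted sampling uses Python's `random.Random.choices`. The usage example draws from the Even Process of Sec. II C with an illustrative $p = 0.3$.

```python
import random

def generate(T, pi, length, rng):
    """Sample `length` output symbols from an edge-labeled HMM.

    Assumed representation: T maps each symbol x to the N x N matrix T^(x);
    pi is the stationary distribution used to draw the initial state S_0.
    """
    n = len(pi)
    state = rng.choices(range(n), weights=pi)[0]
    symbols = []
    for _ in range(length):
        # All outgoing edges (symbol, next state) of the current state, weighted by T^(x)_{ij}.
        edges = [(x, j) for x, Tx in T.items() for j in range(n) if Tx[state][j] > 0]
        weights = [T[x][state][j] for x, j in edges]
        x, state = rng.choices(edges, weights=weights)[0]
        symbols.append(x)
    return "".join(symbols)

p = 0.3  # illustrative parameter value
T = {"0": [[p, 0.0], [0.0, 0.0]], "1": [[0.0, 1.0 - p], [1.0, 0.0]]}   # Even Process
pi = [1.0 / (2.0 - p), (1.0 - p) / (2.0 - p)]
sample = generate(T, pi, 1000, random.Random(42))

# Runs of 1s bounded by 0s on both sides must have even length.
inner_runs = sample.split("0")[1:-1]
print(sample[:40], all(len(run) % 2 == 0 for run in inner_runs))
```

Only the first and last runs of 1s are excluded from the check, since the initial state may be $\sigma_2$ and the sample may end mid-block.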

We denote $S_0, S_1, S_2, \ldots$ as the RVs for the sequence of machine states visited and $X_0, X_1, X_2, \ldots$ as the RVs for the associated sequence of output symbols generated. The sequence of states $(S_L)_{L \ge 0}$ is a Markov chain with transition kernel $T$. However, the stochastic process we consider is not the sequence of states, but rather the associated sequence of outputs $(X_L)_{L \ge 0}$, which generally is not Markovian. We assume the observer directly observes this sequence of outputs, but does not have direct access to the machine's "hidden" internal states.

C. Examples 

In what follows, it will be helpful to refer to several ex- 
ample hidden Markov machines that illustrate key prop- 
erties and definitions. We introduce four examples, all 
with a binary alphabet A = {0, 1}. 

1. Even Process 

Figure 1 gives an HMM for the Even Process. Its transition matrices are:

$$T^{(0)} = \begin{pmatrix} p & 0 \\ 0 & 0 \end{pmatrix} , \quad T^{(1)} = \begin{pmatrix} 0 & 1-p \\ 1 & 0 \end{pmatrix} . \tag{1}$$

FIG. 1: A hidden Markov machine (the ε-machine) for the Even Process. The transitions denote the probability p of generating symbol x as p|x.

The support for the Even Process consists of all binary sequences in which blocks of uninterrupted 1s are even in length, bounded by 0s. After each even length is reached, there is a probability $p$ of breaking the block of 1s by inserting a 0. The hidden Markov machine has two internal states, $\mathcal{S} = \{\sigma_1, \sigma_2\}$, and a single parameter $p \in (0, 1)$ that controls the transition probabilities.

2. Alternating Biased Coins Process 

Figure 2 shows an HMM for the Alternating Biased Coins (ABC) Process. The transition matrices are:

$$T^{(0)} = \begin{pmatrix} 0 & 1-p \\ 1-q & 0 \end{pmatrix} , \quad T^{(1)} = \begin{pmatrix} 0 & p \\ q & 0 \end{pmatrix} . \tag{2}$$

FIG. 2: A hidden Markov machine (the ε-machine) for the Alternating Biased Coins Process.

The process generated by this machine can be thought of as alternately flipping two coins of different biases, $p \neq q$.

3. SNS Process

Figure 3 depicts a two-state HMM for the SNS Process, which generates long sequences of 1s broken by isolated 0s. Its matrices are:

$$T^{(0)} = \begin{pmatrix} 0 & 0 \\ 1-q & 0 \end{pmatrix} , \quad T^{(1)} = \begin{pmatrix} p & 1-p \\ 0 & q \end{pmatrix} . \tag{3}$$

FIG. 3: An HMM for the SNS Process.

Note that the two transitions leaving state $\sigma_1$ both emit $x = 1$.

4. Noisy Period-2 Process

Finally, Fig. 4 depicts a nonminimal HMM for the Noisy Period-2 (NP2) Process. The transition matrices are:

$$T^{(0)} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 1-p & 0 \\ 0 & 0 & 0 & 0 \\ 1-p & 0 & 0 & 0 \end{pmatrix} , \quad T^{(1)} = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & p & 0 \\ 0 & 0 & 0 & 1 \\ p & 0 & 0 & 0 \end{pmatrix} . \tag{4}$$

FIG. 4: An HMM for the Noisy Period-2 Process.

It is clear by inspection that the same process can be captured by a hidden Markov machine with fewer states. Specifically, the distributions over future sequences from states $\sigma_1$ and $\sigma_3$ are the same, so those two states are redundant and can be merged. The same is also true for states $\sigma_2$ and $\sigma_4$.

D. ε-Machines

We now introduce a class of hidden Markov machines that has a number of desirable properties for analyzing synchronization.

Definition 5. A finite-state ε-machine is a finite-state edge-label hidden Markov machine with the following properties:

1. Unifilarity: For each state $\sigma_k \in \mathcal{S}$ and each symbol $x \in \mathcal{A}$ there is at most one outgoing edge from state $\sigma_k$ labeled with symbol $x$.

2. Probabilistically distinct states: For each pair of distinct states $\sigma_k, \sigma_j \in \mathcal{S}$ there exists some finite word $w = x_0 x_1 \ldots x_{L-1}$ such that:

$$\Pr(X^L = w \mid S_0 = \sigma_k) \neq \Pr(X^L = w \mid S_0 = \sigma_j) .$$

The hidden Markov machines given above for the Even Process and ABC Process are both ε-machines. The SNS machine of example 3 is not an ε-machine, however, since state $\sigma_1$ is not unifilar. The NP2 machine of example 4 is also not an ε-machine, since it does not have probabilistically distinct states, as noted before.
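Condition 1 of Def. 5 is easy to check mechanically: every row of every $T^{(x)}$ may contain at most one positive entry. The sketch below applies this check to the four example machines, using the matrices as read from Figs. 1-4 with arbitrary illustrative values $p = 0.3$, $q = 0.6$; the dict-of-matrices representation and the name `is_unifilar` are assumptions. It does not test condition 2, which is why the NP2 machine passes here even though it is not an ε-machine.

```python
def is_unifilar(T):
    """True if every state has at most one outgoing edge per symbol (Def. 5, condition 1).

    Assumed representation: {symbol: N x N nested list of transition probabilities}.
    """
    for Tx in T.values():
        for row in Tx:
            if sum(1 for prob in row if prob > 0) > 1:
                return False
    return True

p, q = 0.3, 0.6  # arbitrary parameter values for illustration
examples = {
    "Even": {"0": [[p, 0], [0, 0]], "1": [[0, 1 - p], [1, 0]]},
    "ABC": {"0": [[0, 1 - p], [1 - q, 0]], "1": [[0, p], [q, 0]]},
    "SNS": {"0": [[0, 0], [1 - q, 0]], "1": [[p, 1 - p], [0, q]]},
    "NP2": {
        "0": [[0, 0, 0, 0], [0, 0, 1 - p, 0], [0, 0, 0, 0], [1 - p, 0, 0, 0]],
        "1": [[0, 1, 0, 0], [0, 0, p, 0], [0, 0, 0, 1], [p, 0, 0, 0]],
    },
}
results = {name: is_unifilar(T) for name, T in examples.items()}
print(results)  # {'Even': True, 'ABC': True, 'SNS': False, 'NP2': True}
```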

ε-Machines were originally defined in Ref. [6] as hidden Markov machines whose states, known as causal states, were the equivalence classes of infinite pasts $\overleftarrow{x}$ with the same probability distribution over futures $\overrightarrow{x}$. This "history" ε-machine definition is, in fact, equivalent to the "generating" ε-machine definition presented above in the finite-state case, although this is not immediately apparent. Formally, it follows from the synchronization results established here and in Ref. [7].

It can also be shown that an ε-machine $M$ for a given process $\mathcal{P}$ is unique up to isomorphism [6]. That is, there cannot be two different finite-state edge-label hidden Markov machines with unifilar transitions and probabilistically distinct states that both generate the same process $\mathcal{P}$. Furthermore, ε-machines are minimal unifilar generators in the sense that any other unifilar machine $M'$ generating the same process $\mathcal{P}$ as an ε-machine $M$ will have more states than $M$. Note that uniqueness does not hold if we remove either condition 1 or 2 in Def. 5.



E. Synchronization 

Assume now that an observer has a correct model $M$ (ε-machine) for a process $\mathcal{P}$, but is not able to directly observe $M$'s hidden internal state. Rather, the observer must infer the internal state by observing the output data that $M$ generates.

For a word $w$ of length $L$ generated by $M$ let $\phi(w) = \Pr(\mathcal{S} \mid w)$ be the observer's belief distribution as to the current state of the machine after observing $w$. That is,

$$\phi(w)_k \equiv \Pr(S_L = \sigma_k \mid X^L = w) = \Pr(S_L = \sigma_k \mid X^L = w, S_0 \sim \pi) .$$

And, define:

$$u(w) \equiv H[\phi(w)] = H[S_L \mid X^L = w]$$

as the observer's uncertainty in the machine state after observing $w$.
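Since $\phi(w)$ is proportional to the row vector $\pi T^{(w_0)} \cdots T^{(w_{L-1})}$ (Bayes' rule applied to the hidden state), it can be computed with a forward recursion. The sketch below does this for the Even Process with an illustrative $p = 1/2$; the helper names `belief` and `uncertainty` and the dict-of-matrices representation are assumptions, and the vector is renormalized after each symbol, which leaves the result unchanged but avoids underflow on long words.

```python
import math

def belief(T, pi, word):
    """phi(w): Pr(S_L = sigma_k | X^L = w), by normalizing pi T^(w_0) ... T^(w_{L-1})."""
    n = len(pi)
    vec = list(pi)
    for x in word:
        vec = [sum(vec[i] * T[x][i][j] for i in range(n)) for j in range(n)]
        total = sum(vec)
        if total == 0:
            raise ValueError(f"word {word!r} cannot be generated by this machine")
        vec = [v / total for v in vec]   # renormalize each step to avoid underflow
    return vec

def uncertainty(T, pi, word):
    """u(w) = H[phi(w)] in bits."""
    return max(0.0, -sum(v * math.log2(v) for v in belief(T, pi, word) if v > 0))

p = 0.5  # illustrative parameter value
T = {"0": [[p, 0.0], [0.0, 0.0]], "1": [[0.0, 1.0 - p], [1.0, 0.0]]}   # Even Process
pi = [1.0 / (2.0 - p), (1.0 - p) / (2.0 - p)]
for w in ["", "1", "11", "10", "0110"]:
    print(repr(w), [round(f, 3) for f in belief(T, pi, w)], round(uncertainty(T, pi, w), 3))
```

Any word containing a 0 leaves the belief concentrated on a single state, so $u(w) = 0$ for such words.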

Denote $\mathcal{L}(M)$ as the set of all finite words that $M$ can generate, $\mathcal{L}_L(M)$ as the set of all length-$L$ words it can generate, and $\mathcal{L}_\infty(M)$ as the set of all infinite sequences $\overrightarrow{x} = x_0 x_1 \ldots$ that it can generate.

Definition 6. A word $w \in \mathcal{L}(M)$ is a synchronizing word (or sync word) for $M$ if $u(w) = 0$; that is, if the observer knows the current state of the machine with certainty after observing $w$.

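We denote the set of $M$'s infinite synchronizing sequences as SYN($M$) and the set of $M$'s infinite weakly synchronizing sequences as WSYN($M$):

$$\mathrm{SYN}(M) = \{\overrightarrow{x} \in \mathcal{L}_\infty(M) : u(x^L) = 0 \text{ for some } L\} , \text{ and}$$

$$\mathrm{WSYN}(M) = \{\overrightarrow{x} \in \mathcal{L}_\infty(M) : u(x^L) \to 0 \text{ as } L \to \infty\} .$$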

Definition 7. An ε-machine $M$ is exactly synchronizable (or simply exact) if $\Pr(\mathrm{SYN}(M)) = 1$; that is, if the observer synchronizes to almost every (a.e.) sequence generated by the machine in finite time.

Definition 8. An ε-machine $M$ is asymptotically synchronizable if $\Pr(\mathrm{WSYN}(M)) = 1$; that is, if the observer's uncertainty in the machine state vanishes asymptotically for a.e. sequence generated by the machine.

The Even Process ε-machine, Fig. 1, is an exact machine. Any word containing a 0 is a sync word for this machine, and almost every $\overrightarrow{x}$ it generates contains at least one 0. The ABC Process ε-machine, Fig. 2, is not exactly synchronizable, but it is asymptotically synchronizable.

Remark. If $w \in \mathcal{L}(M)$ is a sync word, then by unifilarity so is $wv$, for all $v$ with $wv \in \mathcal{L}(M)$. Once an observer synchronizes exactly, it remains synchronized exactly for all future times. It follows that any exactly synchronizable machine is also asymptotically synchronizable.

Remark. If $w \in \mathcal{L}(M)$ is a sync word then so is $vw$, for all $v$ with $vw \in \mathcal{L}(M)$. Since any finite word $w \in \mathcal{L}(M)$ will be contained in almost every infinite sequence $\overrightarrow{x}$ the machine generates, it follows that a machine is exactly synchronizable if (and only if) it has some sync word $w$ of finite length.
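The remark suggests a direct, if brute-force, way to decide exactness: search for a finite sync word. Because the graph is strongly connected, $\pi$ is positive on every state, so the support of $\phi(w)$ is exactly $\delta(\mathcal{S}, w)$, and $u(w) = 0$ precisely when that set is a singleton. The sketch below runs a breadth-first search over the observer's sets of possible states; `shortest_sync_word` is a hypothetical helper, the dict-of-matrices representation is an assumption, and the search is exponential in $N$ in the worst case (the polynomial test of Sec. IV avoids this).

```python
from collections import deque

def successor_set(T, states, x):
    """delta(S, x): states reachable in one step on symbol x from some state in S."""
    return frozenset(j for i in states for j, prob in enumerate(T[x][i]) if prob > 0)

def shortest_sync_word(T):
    """Breadth-first search over the observer's sets of possible states.

    Returns a shortest word w with |delta(S, w)| = 1, or None if no such word exists.
    """
    n = len(next(iter(T.values())))
    start = frozenset(range(n))
    if n == 1:
        return ""
    parent = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for x in sorted(T):
            nxt = successor_set(T, current, x)
            if not nxt or nxt in parent:
                continue  # x is impossible here, or this set was already reached
            parent[nxt] = (current, x)
            if len(nxt) == 1:
                word, node = [], nxt
                while parent[node] is not None:   # walk back to the full state set
                    node, symbol = parent[node]
                    word.append(symbol)
                return "".join(reversed(word))
            queue.append(nxt)
    return None

p, q = 0.5, 0.7  # illustrative parameter values
even = {"0": [[p, 0.0], [0.0, 0.0]], "1": [[0.0, 1.0 - p], [1.0, 0.0]]}
abc = {"0": [[0.0, 1.0 - p], [1.0 - q, 0.0]], "1": [[0.0, p], [q, 0.0]]}
print(shortest_sync_word(even), shortest_sync_word(abc))   # 0 None
```

For the ABC machine every symbol maps $\{\sigma_1, \sigma_2\}$ back onto itself, so the search exhausts its frontier and reports that no sync word exists.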

Remark. It turns out that all finite-state ε-machines are asymptotically synchronizable; see Ref. [7]. Hence, there are two disjoint classes to consider: exactly synchronizable machines and asymptotically synchronizable machines that are nonexact. The exact case is the subject of the remainder.



Finally, one last important quantity for synchronization is the observer's average uncertainty in the machine state after seeing a length-$L$ block of output [3].

Definition 9. The observer's average state uncertainty at time $L$ is:

$$U(L) \equiv H[S_L \mid X^L] = \sum_{\{x^L\}} \Pr(x^L) \cdot H[S_L \mid X^L = x^L] . \tag{5}$$

That is, U(L) is the expected value of an observer's 
uncertainty in the machine state after observing L sym- 
bols. 
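Eq. (5) can be evaluated exactly for small $L$ by enumerating words and weighting each $u(w)$ by $\Pr(w)$. The sketch below assumes the same dict-of-matrices representation as before and uses the Even Process with an illustrative $p = 1/2$; `state_uncertainty` is a hypothetical name. For this machine the only nonsynchronizing word of length $L$ is $1^L$, so $U(L)$ reduces to $\Pr(1^L)\, u(1^L)$.

```python
import math

def state_uncertainty(T, pi, max_len):
    """Return [U(0), ..., U(max_len)] in bits via Eq. (5), enumerating all words."""
    n = len(pi)
    U = [0.0] * (max_len + 1)

    def visit(vec, depth):
        prob = sum(vec)                  # vec = pi T^(w), so Pr(w) = sum(vec)
        if prob <= 0.0:
            return
        phi = [v / prob for v in vec]    # belief distribution phi(w)
        U[depth] += prob * max(0.0, -sum(f * math.log2(f) for f in phi if f > 0))
        if depth < max_len:
            for Tx in T.values():
                visit([sum(vec[i] * Tx[i][j] for i in range(n)) for j in range(n)], depth + 1)

    visit(list(pi), 0)
    return U

p = 0.5  # illustrative parameter value
T = {"0": [[p, 0.0], [0.0, 0.0]], "1": [[0.0, 1.0 - p], [1.0, 0.0]]}   # Even Process
pi = [1.0 / (2.0 - p), (1.0 - p) / (2.0 - p)]
U = state_uncertainty(T, pi, 10)
print([round(u, 4) for u in U])
```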

Now, for an ε-machine, an observer's prediction of the next output symbol is a direct function of the probability distribution over machine states induced by the previously observed symbols. Specifically,

$$\Pr(X_L = x \mid X^L = x^L) = \sum_k \Pr(x \mid \sigma_k) \cdot \Pr(S_L = \sigma_k \mid X^L = x^L) . \tag{6}$$
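In matrix terms, $\Pr(x \mid \sigma_k)$ is the $k$th row sum of $T^{(x)}$, so Eq. (6) is a belief-weighted average of these row sums. The minimal sketch below assumes the dict-of-matrices representation and illustrative Even Process parameters; `predict_next` and `next_symbol_entropy` are hypothetical names. It shows how synchronization to $\sigma_2$ makes the next symbol certain.

```python
import math

def predict_next(T, phi):
    """Eq. (6): Pr(X_L = x | past) = sum_k Pr(x | sigma_k) * phi_k.

    Pr(x | sigma_k) is the k-th row sum of T^(x).
    """
    return {x: sum(phi[k] * sum(Tx[k]) for k in range(len(phi))) for x, Tx in T.items()}

def next_symbol_entropy(T, phi):
    """Entropy (bits) of the predicted next-symbol distribution."""
    dist = predict_next(T, phi)
    return max(0.0, -sum(q * math.log2(q) for q in dist.values() if q > 0))

p = 0.5  # illustrative parameter value
T = {"0": [[p, 0.0], [0.0, 0.0]], "1": [[0.0, 1.0 - p], [1.0, 0.0]]}   # Even Process
pi = [1.0 / (2.0 - p), (1.0 - p) / (2.0 - p)]
print(predict_next(T, pi))            # before any observation
print(predict_next(T, [1.0, 0.0]))    # synchronized to sigma_1
print(predict_next(T, [0.0, 1.0]))    # synchronized to sigma_2
```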

Hence, the better an observer knows the machine state at the current time, the better it can predict the next symbol generated. And, on average, the closer $U(L)$ is to 0, the closer $h_\mu(L)$ is to $h_\mu$. Therefore, the rate of convergence of $h_\mu(L)$ to $h_\mu$ for an ε-machine is closely related to the average rate of synchronization.



III. EXACT SYNCHRONIZATION RESULTS 

This section provides our main results on synchronization rates for exact machines and draws out consequences for the convergence rates of $U(L)$ and $h_\mu(L)$.

The following notation will be used throughout: 

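• $\mathrm{SYN}_L = \{w \in \mathcal{L}_L(M) : w \text{ is a sync word for } M\}$.

• $\mathrm{NSYN}_L = \{w \in \mathcal{L}_L(M) : w \text{ is not a sync word for } M\}$.

• $\mathrm{SYN}_{L,\sigma_k} = \{w \in \mathcal{L}_L(M) : w \text{ synchronizes the observer to state } \sigma_k\}$.

• $\mathcal{L}(M, \sigma_k) = \{w : M \text{ can generate } w \text{ starting in state } \sigma_k\}$.

• For words $w, w' \in \mathcal{L}(M)$, we say $w \sqsubseteq w'$ if there exist words $u, v$ (of length $\ge 0$) such that $w' = uwv$.

• For a word $w \in \mathcal{L}(M, \sigma_k)$, $\delta(\sigma_k, w)$ is defined to be the (unique) state in $\mathcal{S}$ such that $\sigma_k \xrightarrow{w} \delta(\sigma_k, w)$.

• For a set of states $S \subseteq \mathcal{S}$, we define:

$$\delta(S, w) = \{\sigma_j \in \mathcal{S} : \sigma_k \xrightarrow{w} \sigma_j \text{ for some } \sigma_k \in S\} .$$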
A. Exact Machine Synchronization Theorem

Our first theorem states that an observer synchronizes 
(exactly) to the internal state of any exact e-machine 
exponentially fast. 

Theorem 1. For any exact ε-machine $M$, there are constants $K > 0$ and $0 < \alpha < 1$ such that:

$$\Pr(\mathrm{NSYN}_L) \le K \alpha^L , \tag{7}$$

for all $L \in \mathbb{N}$.

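Proof. Let $M$ be an exact machine with sync word $w \in \mathcal{L}(M, \sigma_k)$. Since the graph of $M$ is strongly connected, we know that for each state $\sigma_j$ there is a word $v_j$ such that $\delta(\sigma_j, v_j) = \sigma_k$. Let $w_j = v_j w$, $n = \max_j |w_j|$, and $p = \min_j \Pr(w_j \mid \sigma_j)$. Then, for all $L \ge 0$, we have:

$$\begin{aligned}
\Pr(w \sqsubseteq X^{n+L} \mid w \not\sqsubseteq X^L)
&\ge \Pr(w \sqsubseteq X_L^n \mid w \not\sqsubseteq X^L) \\
&\ge \min_j \Pr(w_j \sqsubseteq X_L^n \mid S_L = \sigma_j) \\
&\ge p .
\end{aligned} \tag{8}$$

Hence,

$$\Pr(w \not\sqsubseteq X^{n+L} \mid w \not\sqsubseteq X^L) \le 1 - p , \tag{9}$$

for all $L \ge 0$. And, therefore, for all $m \in \mathbb{N}$:

$$\begin{aligned}
\Pr(\mathrm{NSYN}_{mn}) &\le \Pr(w \not\sqsubseteq X^{mn}) \\
&= \Pr(w \not\sqsubseteq X^n) \cdot \Pr(w \not\sqsubseteq X^{2n} \mid w \not\sqsubseteq X^n) \cdots \Pr(w \not\sqsubseteq X^{mn} \mid w \not\sqsubseteq X^{(m-1)n}) \\
&\le (1 - p)^m \\
&= \beta^m ,
\end{aligned} \tag{10}$$

where $\beta = 1 - p$. Or equivalently, for any length $L = mn$ ($m \in \mathbb{N}$):

$$\Pr(\mathrm{NSYN}_L) \le \alpha^L , \tag{11}$$

where $\alpha = \beta^{1/n}$. Since $\Pr(\mathrm{NSYN}_L)$ is monotonically decreasing, it follows that:

$$\Pr(\mathrm{NSYN}_L) \le \frac{1}{\alpha^n} \cdot \alpha^L = K \alpha^L , \tag{12}$$

for all $L \in \mathbb{N}$, where $K = 1/\alpha^n$. □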

Remark. In the above proof we implicitly assumed $\beta \neq 0$. If $\beta = 0$, then the conclusion follows trivially.



B. Synchronization Rate 

Theorem 1 states that an observer synchronizes (exactly) to any exact ε-machine exponentially fast. However, the sync rate constant:

$$\alpha^* = \lim_{L \to \infty} \Pr(\mathrm{NSYN}_L)^{1/L} \tag{13}$$

depends on the machine, and it may often be of practical interest to know the value of this constant. We now provide a method for computing $\alpha^*$ analytically. It is based on the construction of an auxiliary machine $\widehat{M}$.

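Definition 10. Let $M$ be an ε-machine with states $\mathcal{S} = \{\sigma_1, \ldots, \sigma_N\}$, alphabet $\mathcal{A}$, and transition matrices $T^{(x)}$, $x \in \mathcal{A}$. The possibility machine $\widehat{M}$ is defined as follows:

1. The alphabet of $\widehat{M}$ is $\mathcal{A}$.

2. The states of $\widehat{M}$ are pairs of the form $(\sigma, S)$ where $\sigma \in \mathcal{S}$ and $S$ is a subset of $\mathcal{S}$ that contains $\sigma$.

3. The transition probabilities are:

$$\Pr\big((\sigma, S) \xrightarrow{x} (\sigma', S')\big) = \Pr(x \mid \sigma) \, I\big(x, (\sigma, S), (\sigma', S')\big) ,$$

where $I(x, (\sigma, S), (\sigma', S'))$ is the indicator function:

$$I\big(x, (\sigma, S), (\sigma', S')\big) = \begin{cases} 1 & \text{if } \delta(\sigma, x) = \sigma' \text{ and } \delta(S, x) = S' , \\ 0 & \text{otherwise.} \end{cases}$$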

A state of $\widehat{M}$ is said to be initial if it is of the form $(\sigma, \mathcal{S})$ for some $\sigma \in \mathcal{S}$. For simplicity we restrict the $\widehat{M}$ machine to consist of only those states that are accessible from initial states. The other states are irrelevant for the analysis below.

The idea is that $\widehat{M}$'s states represent states of the joint (ε-machine, observer) system. State $\sigma$ is the true ε-machine state at the current time, and $S$ is the set of states that the observer believes are currently possible for the ε-machine to be in, after observing all previous symbols. Initially, all states are possible (to the observer), so initial states are those in which the set of possible states is the complete set $\mathcal{S}$.

If the current true ε-machine state is $\sigma$, and then the symbol $x$ is generated, the new true ε-machine state must be $\delta(\sigma, x)$. Similarly, if the observer believes any of the states in $S$ are currently possible, and then the symbol $x$ is generated, the new set of possible states to the observer is $\delta(S, x)$. This accounts for the transitions in $\widehat{M}$ topologically. The probability of generating a given symbol $x$ from $(\sigma, S)$ is, of course, governed only by the true state $\sigma$ of the ε-machine: $\Pr(x \mid (\sigma, S)) = \Pr(x \mid \sigma)$.
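The construction can be sketched as a breadth-first exploration from the initial states $(\sigma, \mathcal{S})$, following $\sigma \mapsto \delta(\sigma, x)$ and $S \mapsto \delta(S, x)$ exactly as described above. This is an illustrative sketch under stated assumptions: states are zero-based indices, a joint state is a tuple `(sigma, frozenset S)`, the machine is given as a dict of matrices and assumed unifilar, and `possibility_machine` is a hypothetical name. For the Even Process it yields four states: the two initial states and the recurrent pair $(\sigma_1, \{\sigma_1\})$, $(\sigma_2, \{\sigma_2\})$.

```python
from collections import deque

def possibility_machine(T):
    """Accessible part of the possibility machine M-hat (Def. 10).

    Assumes M is unifilar, so delta(sigma, x) is a single state when defined.
    Returns {(sigma, S): [(x, Pr(x | sigma), (sigma', S')), ...]} with S a frozenset.
    """
    n = len(next(iter(T.values())))
    full = frozenset(range(n))

    def delta_state(s, x):
        targets = [j for j, prob in enumerate(T[x][s]) if prob > 0]
        return targets[0] if targets else None

    def delta_set(S, x):
        return frozenset(j for s in S for j, prob in enumerate(T[x][s]) if prob > 0)

    edges = {}
    queue = deque((s, full) for s in range(n))   # the initial states (sigma, S)
    while queue:
        node = queue.popleft()
        if node in edges:
            continue
        sigma, S = node
        edges[node] = []
        for x, Tx in T.items():
            target = delta_state(sigma, x)
            if target is None:
                continue  # the true state sigma cannot emit x
            succ = (target, delta_set(S, x))
            edges[node].append((x, sum(Tx[sigma]), succ))
            if succ not in edges:
                queue.append(succ)
    return edges

p = 0.5  # illustrative parameter value
T = {"0": [[p, 0.0], [0.0, 0.0]], "1": [[0.0, 1.0 - p], [1.0, 0.0]]}   # Even Process
M_hat = possibility_machine(T)
for (sigma, S), out in M_hat.items():
    print((sigma, sorted(S)), [(x, prob, (t, sorted(St))) for x, prob, (t, St) in out])
```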

An example of this construction for a 3-state exact ε-machine is given in Appendix A. Note that the graph of the $\widehat{M}$ machine there has a single recurrent strongly connected component, which is isomorphic to the original machine $M$. This is not an accident: it will always be the case, as long as the original machine $M$ is exact.



Remark. If $M$ is an exact machine with more than one state, the graph of $\widehat{M}$ itself is never strongly connected. So, $\widehat{M}$ is not an ε-machine or even an HMM in the sense of Def. 4. However, we still refer to $\widehat{M}$ as a "machine".

In what follows, we assume $M$ is exact. We denote the states of $\widehat{M}$ as $\widehat{\mathcal{S}} = \{q_1, \ldots, q_{\widehat{N}}\}$, its symbol-labeled transition matrices $\widehat{T}^{(x)}$, and its overall state-to-state transition matrix $\widehat{T} = \sum_{x \in \mathcal{A}} \widehat{T}^{(x)}$. We assume the states are ordered in such a way that the initial states $(\sigma_1, \mathcal{S}), \ldots, (\sigma_N, \mathcal{S})$ are, respectively, $q_1, \ldots, q_N$. Similarly, the recurrent states $(\sigma_1, \{\sigma_1\}), (\sigma_2, \{\sigma_2\}), \ldots, (\sigma_N, \{\sigma_N\})$ are, respectively, $q_{n+1}, q_{n+2}, \ldots, q_{\widehat{N}}$, where $n = \widehat{N} - N$. The ordering of the other states is irrelevant. In this case, the matrix $\widehat{T}$ has the following block upper-triangular form:



$$\widehat{T} = \begin{pmatrix} B & B' \\ O & T \end{pmatrix} , \tag{14}$$

where $B$ is an $n \times n$ matrix with nonnegative entries, $B'$ is an $n \times N$ matrix with nonnegative entries, $O$ is an $N \times n$ matrix of all zeros, and $T$ is the $N \times N$ state-to-state transition matrix of the original ε-machine $M$.

Let $\widehat{\pi} = (\pi_1, \ldots, \pi_N, 0, \ldots, 0)$ denote the length-$\widehat{N}$ row vector whose distribution over the initial states is the same as the stationary distribution $\pi$ for the ε-machine $M$. Then, the initial probability distribution $\phi_0$ over states of the joint (ε-machine, observer) system is simply:

$$\phi_0 = \widehat{\pi} \tag{15}$$

and, thus, the distribution over states of the joint system after the first $L$ symbols is:

$$\phi_L = \widehat{\pi} \, \widehat{T}^L . \tag{16}$$



If the joint system is in a recurrent state of the form $(\sigma_k, \{\sigma_k\})$, then to the observer the only possible state of the ε-machine is the true state, so the observer is synchronized. For all other states of $\widehat{M}$, the observer is not yet synchronized. Hence, the probability the observer is not synchronized after $L$ symbols is simply the combined probability of all nonrecurrent states $q_i$ in the distribution $\phi_L$. Specifically, we have:

$$\Pr(\mathrm{NSYN}_L) = \sum_{i=1}^{n} (\phi_L)_i = \sum_{i=1}^{n} \big(\widehat{\pi} \, \widehat{T}^L\big)_i = \big\| \pi_B B^L \big\|_1 , \tag{17}$$

where $\pi_B = (\pi_1, \ldots, \pi_N, 0, \ldots, 0)$ is the length-$n$ row vector corresponding to the distribution over initial states $\pi$. The third equality follows from the block upper-triangular form of $\widehat{T}$.
Appendix B shows that:

$$\lim_{L \to \infty} \big\| \pi_B B^L \big\|_1^{1/L} = r , \tag{18}$$

where $r = r(B)$ is the (left) spectral radius of $B$:

$$r(B) = \max\{|\lambda| : \lambda \text{ is a (left) eigenvalue of } B\} . \tag{19}$$

Thus, we have established the following result.

Theorem 2. For any exact ε-machine $M$, $\alpha^* = r$.
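As a numerical sanity check of Thm. 2, consider the Even Process. From $(\sigma_1, \mathcal{S})$ the symbol 1 leads to $(\sigma_2, \mathcal{S})$ with probability $1-p$ while the symbol 0 leads to the recurrent state $(\sigma_1, \{\sigma_1\})$; from $(\sigma_2, \mathcal{S})$ the symbol 1 leads back to $(\sigma_1, \mathcal{S})$ with probability 1. Hence $n = 2$ and $B = \begin{pmatrix} 0 & 1-p \\ 1 & 0 \end{pmatrix}$, whose eigenvalues are $\pm\sqrt{1-p}$, so Thm. 2 predicts $\alpha^* = \sqrt{1-p}$. The sketch below, which assumes $p = 1/2$ and uses a large finite $L$ as a stand-in for the limit, evaluates $\|\pi_B B^L\|_1^{1/L}$ from Eq. (17) and the Gelfand quantity $\|B^L\|_1^{1/L}$; the helper names are illustrative.

```python
import math

def vec_mat(v, A):
    """Row vector times matrix."""
    return [sum(v[i] * A[i][j] for i in range(len(v))) for j in range(len(A[0]))]

def mat_mul(A, C):
    return [vec_mat(row, C) for row in A]

def norm1_left(A):
    """Induced 1-norm for row vectors acting from the left: the largest absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)

p = 0.5  # illustrative parameter value
pi = [1.0 / (2.0 - p), (1.0 - p) / (2.0 - p)]   # stationary distribution of the Even Process
B = [[0.0, 1.0 - p], [1.0, 0.0]]                 # nonrecurrent block of T-hat, order q1, q2

L = 200                           # large finite L standing in for the limit
v = list(pi)                      # pi_B: weights on the initial states q1, q2
B_L = [[1.0, 0.0], [0.0, 1.0]]    # running power B^L
for _ in range(L):
    v = vec_mat(v, B)
    B_L = mat_mul(B_L, B)

nsyn = sum(v)                     # Eq. (17): Pr(NSYN_L) = ||pi_B B^L||_1 (entries are nonnegative)
print(nsyn ** (1 / L), norm1_left(B_L) ** (1 / L), math.sqrt(1 - p))
```

All three printed numbers agree with $\sqrt{1/2} \approx 0.7071$, consistent with the direct count: the only nonsynchronizing words of this machine are $1^L$.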

C. Consequences 

We now apply Thm. 1 to show that an observer's average uncertainty $U(L)$ in the machine state and average uncertainty $h_\mu(L)$ in predictions of future symbols both decay exponentially fast to their respective limits: 0 and $h_\mu$. The decay constant $\alpha$ in both cases is essentially bounded by the sync rate constant $\alpha^*$ from Thm. 2.

Proposition 1. For any exact ε-machine $M$, there are constants $K > 0$ and $0 < \alpha < 1$ such that:

$$U(L) \le K \alpha^L , \quad \text{for all } L \in \mathbb{N}. \tag{20}$$



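Proof. Let $M$ be any exact machine. By Thm. 1 there are constants $C > 0$ and $0 < \alpha < 1$ such that $\Pr(\mathrm{NSYN}_L) \le C \alpha^L$, for all $L \in \mathbb{N}$. Thus, we have:

$$\begin{aligned}
U(L) &= \sum_{w \in \mathcal{L}_L(M)} \Pr(w) u(w) \\
&= \sum_{w \in \mathrm{SYN}_L} \Pr(w) u(w) + \sum_{w \in \mathrm{NSYN}_L} \Pr(w) u(w) \\
&\le 0 + \sum_{w \in \mathrm{NSYN}_L} \Pr(w) \log(N) \\
&\le \log(N) \cdot C \alpha^L \\
&= K \alpha^L ,
\end{aligned} \tag{21}$$

where $N$ is the number of machine states and $K = C \log(N)$. □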
Let $h_k = H[X_0 \mid S_0 = \sigma_k]$ and $h_w = H[X_0 \mid S_0 \sim \phi(w)]$ be the conditional entropies in the next symbol given the state $\sigma_k$ and word $w$.

Proposition 2. For any exact ε-machine $M$:

$$h_\mu = H[X_0 \mid S_0] = \sum_k \pi_k h_k \tag{22}$$

and there are constants $K > 0$ and $0 < \alpha < 1$ such that:

$$h_\mu(L) - h_\mu \le K \alpha^L , \quad \text{for all } L \in \mathbb{N}. \tag{23}$$

Remark. The $h_\mu$ formula Eq. (22) has been known for some time, although in slightly different contexts. Shannon, for example, derived this formula in his original publication for a type of hidden Markov machine that is similar (apparently unifilar) to an ε-machine.
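Eq. (22) needs only the stationary distribution and the per-state next-symbol distributions $\Pr(x \mid \sigma_k)$, the row sums of $T^{(x)}$. The sketch below is one way to evaluate it, under the assumption that a dense nested-list representation is acceptable. It solves $\pi T = \pi$ by power iteration on the lazy chain $(I + T)/2$, a choice made here because the lazy chain has the same stationary distribution but is aperiodic, so the iteration also converges for periodic machines such as the ABC Process; any linear solver would serve equally well. The names are hypothetical, and the Even Process values $p = 1/2$ and $p = 0.2$ are illustrative.

```python
import math

def stationary_distribution(T, iterations=2000):
    """Solve pi T = pi by power iteration on the lazy chain (I + T) / 2.

    The lazy chain has the same stationary distribution as T but is aperiodic,
    so the iteration converges even for periodic machines such as the ABC Process.
    """
    mats = list(T.values())
    n = len(mats[0])
    T_total = [[sum(Tx[i][j] for Tx in mats) for j in range(n)] for i in range(n)]
    pi = [1.0 / n] * n
    for _ in range(iterations):
        step = [sum(pi[i] * T_total[i][j] for i in range(n)) for j in range(n)]
        pi = [(a + b) / 2 for a, b in zip(pi, step)]
    return pi

def entropy_rate(T):
    """Eq. (22): h_mu = sum_k pi_k h_k, with h_k = H[X_0 | S_0 = sigma_k] in bits."""
    pi = stationary_distribution(T)
    h_mu = 0.0
    for k, weight in enumerate(pi):
        probs = [sum(Tx[k]) for Tx in T.values()]    # Pr(x | sigma_k) for each symbol x
        h_mu += weight * -sum(q * math.log2(q) for q in probs if q > 0)
    return h_mu

def even(p):
    """Even Process matrices (assumed representation: symbol -> nested-list matrix)."""
    return {"0": [[p, 0.0], [0.0, 0.0]], "1": [[0.0, 1.0 - p], [1.0, 0.0]]}

print(stationary_distribution(even(0.5)), entropy_rate(even(0.5)))
```

For the Even Process only $\sigma_1$ has an uncertain next symbol, so the result is $\pi_1 H_b(p) = H_b(p)/(2-p)$.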

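Proof. Let $M$ be any exact machine. Since we know $h_\mu(L) \searrow h_\mu$, it suffices to show there are constants $K > 0$ and $0 < \alpha < 1$ such that:

$$\Big| h_\mu(L) - \sum_k \pi_k h_k \Big| \le K \alpha^L , \tag{24}$$

for all $L \in \mathbb{N}$. This will establish both the value of $h_\mu$ and the necessary convergence.

Now, by Thm. 1 there are constants $C > 0$ and $0 < \alpha < 1$ such that $\Pr(\mathrm{NSYN}_L) \le C \alpha^L$, for all $L \in \mathbb{N}$. Also, note that for all $L$ and $k$ we have:

$$\pi_k = \sum_{w \in \mathcal{L}_L(M)} \Pr(w) \cdot \phi(w)_k \ge \sum_{w \in \mathrm{SYN}_{L,\sigma_k}} \Pr(w) \cdot \phi(w)_k = \Pr(\mathrm{SYN}_{L,\sigma_k}) . \tag{25}$$

Thus,

$$\sum_k \big(\pi_k - \Pr(\mathrm{SYN}_{L,\sigma_k})\big) \cdot h_k \ge 0 \tag{26}$$

and

$$\begin{aligned}
\sum_k \big(\pi_k - \Pr(\mathrm{SYN}_{L,\sigma_k})\big) \cdot h_k
&\le \sum_k \big(\pi_k - \Pr(\mathrm{SYN}_{L,\sigma_k})\big) \cdot \log|\mathcal{A}| \\
&= \log|\mathcal{A}| \cdot \Big( \sum_k \pi_k - \sum_k \Pr(\mathrm{SYN}_{L,\sigma_k}) \Big) \\
&= \log|\mathcal{A}| \cdot \big(1 - \Pr(\mathrm{SYN}_L)\big) \\
&= \log|\mathcal{A}| \cdot \Pr(\mathrm{NSYN}_L) \\
&\le \log|\mathcal{A}| \cdot C \alpha^L .
\end{aligned} \tag{27}$$

Also, clearly,

$$\sum_{w \in \mathrm{NSYN}_L} \Pr(w) \cdot h_w \ge 0 \tag{28}$$

and

$$\sum_{w \in \mathrm{NSYN}_L} \Pr(w) \cdot h_w \le \log|\mathcal{A}| \cdot \Pr(\mathrm{NSYN}_L) \le \log|\mathcal{A}| \cdot C \alpha^L . \tag{29}$$

Therefore, we have for all $L \in \mathbb{N}$:

$$\begin{aligned}
\Big| h_\mu(L+1) - \sum_k \pi_k h_k \Big|
&= \Big| \sum_{w \in \mathcal{L}_L(M)} \Pr(w) h_w - \sum_k \pi_k h_k \Big| \\
&= \Big| \sum_{w \in \mathrm{NSYN}_L} \Pr(w) h_w + \sum_{w \in \mathrm{SYN}_L} \Pr(w) h_w - \sum_k \pi_k h_k \Big| \\
&= \Big| \sum_{w \in \mathrm{NSYN}_L} \Pr(w) h_w + \sum_k \Pr(\mathrm{SYN}_{L,\sigma_k}) h_k - \sum_k \pi_k h_k \Big| \\
&= \Big| \sum_{w \in \mathrm{NSYN}_L} \Pr(w) h_w - \sum_k \big(\pi_k - \Pr(\mathrm{SYN}_{L,\sigma_k})\big) h_k \Big| \\
&\le C \log|\mathcal{A}| \, \alpha^L .
\end{aligned} \tag{30}$$

The last inequality follows from Eqs. (26)-(29), since $|x - y| \le z$ for all nonnegative real numbers $x$, $y$, and $z$ with $x \le z$ and $y \le z$.

Finally, since:

$$\Big| h_\mu(L+1) - \sum_k \pi_k h_k \Big| \le C \log|\mathcal{A}| \, \alpha^L \tag{31}$$

for all $L \in \mathbb{N}$, we know that:

$$\Big| h_\mu(L) - \sum_k \pi_k h_k \Big| \le K \alpha^L \tag{32}$$

for all $L \in \mathbb{N}$, where $K = (\log|\mathcal{A}| / \alpha) \cdot \max\{C, 1\}$. □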

Remark. For any $\alpha > \alpha^*$ there exists some $K > 0$ for which Eq. (7) holds. Hence, by the constructive proofs above, we see that the constant $\alpha$ in Props. 1 and 2 can be chosen arbitrarily close to $\alpha^*$: $\alpha = \alpha^* + \epsilon$.



IV. CHARACTERIZATION OF EXACT ε-MACHINES

In this section we provide a set of necessary and suffi- 
cient conditions for exactness and an algorithmic test for 
exactness based upon these conditions. 



A. Exact Machine Characterization Theorem 

Definition 11. States $\sigma_k$ and $\sigma_j$ are said to be topologically distinct if $\mathcal{L}(M, \sigma_k) \neq \mathcal{L}(M, \sigma_j)$.

Definition 12. States $\sigma_k$ and $\sigma_j$ are said to be path convergent if there exists $w \in \mathcal{L}(M, \sigma_k) \cap \mathcal{L}(M, \sigma_j)$ such that $\delta(\sigma_k, w) = \delta(\sigma_j, w)$.

If states $\sigma_k$ and $\sigma_j$ are topologically distinct (or path convergent) we will also say the pair $(\sigma_k, \sigma_j)$ is topologically distinct (or path convergent).

Theorem 3. An ε-machine $M$ is exact if and only if every pair of distinct states $(\sigma_k, \sigma_j)$ satisfies at least one of the following two conditions:

(i) The pair $(\sigma_k, \sigma_j)$ is topologically distinct.

(ii) The pair $(\sigma_k, \sigma_j)$ is path convergent.

Proof. It was noted above that an ε-machine $M$ is exact if and only if it has some sync word $w$ of finite length. Therefore, it is enough to show that every pair of distinct states $(\sigma_k, \sigma_j)$ satisfies either (i) or (ii) if and only if $M$ has some sync word $w$ of finite length.

We establish the "if" first: If $M$ has a sync word $w$, then every pair of distinct states $(\sigma_k, \sigma_j)$ satisfies either (i) or (ii).

Let $w$ be a sync word for $M$. Then $w \in \mathcal{L}(M, \sigma_k)$ for some $k$. Take words $v_j$, $j = 1, 2, \ldots, N$, such that $\delta(\sigma_j, v_j) = \sigma_k$. Then, the word $w_j = v_j w \in \mathcal{L}(M, \sigma_j)$ is also a sync word for $M$ for each $j$. Therefore, for each $i \neq j$ either $w_j \notin \mathcal{L}(M, \sigma_i)$ or $\delta(\sigma_i, w_j) = \delta(\sigma_j, w_j)$. This establishes that the pair $(\sigma_i, \sigma_j)$ is either topologically distinct or path convergent. Since this holds for all $j = 1, 2, \ldots, N$ and for all $i \neq j$, we know every pair of distinct states is either topologically distinct or path convergent.

Now, for the "only if" case: If every pair of distinct states $(\sigma_k, \sigma_j)$ satisfies either (i) or (ii), then $M$ has a sync word $w$.

If each pair of distinct states $(\sigma_k, \sigma_j)$ satisfies either (i) or (ii), then for all $k$ and $j$ ($k \neq j$) there is some word $w_{\sigma_k, \sigma_j}$ such that one of the following three conditions is satisfied:

1. $w_{\sigma_k, \sigma_j} \in \mathcal{L}(M, \sigma_k)$, but $w_{\sigma_k, \sigma_j} \notin \mathcal{L}(M, \sigma_j)$.

2. $w_{\sigma_k, \sigma_j} \in \mathcal{L}(M, \sigma_j)$, but $w_{\sigma_k, \sigma_j} \notin \mathcal{L}(M, \sigma_k)$.

3. $w_{\sigma_k, \sigma_j} \in \mathcal{L}(M, \sigma_k) \cap \mathcal{L}(M, \sigma_j)$ and $\delta(\sigma_k, w_{\sigma_k, \sigma_j}) = \delta(\sigma_j, w_{\sigma_k, \sigma_j})$.

We construct a sync word $w = w_1 w_2 \ldots w_m$ for $M$, where each $w_i = w_{\sigma_{k_i}, \sigma_{j_i}}$ for some $k_i$ and $j_i$, as follows.

• Let $S^0 = \{\sigma^0_1, \ldots, \sigma^0_{N_0}\} = \mathcal{S} = \{\sigma_1, \ldots, \sigma_N\}$. Take $w_1 = w_{\sigma^0_1, \sigma^0_2}$.

• Let $S^1 = \{\sigma^1_1, \ldots, \sigma^1_{N_1}\} = \delta(S^0, w_1)$. Since $w_1 = w_{\sigma^0_1, \sigma^0_2}$ satisfies either condition (1), (2), or (3), we know $N_1 < N_0$. Take $w_2 = w_{\sigma^1_1, \sigma^1_2}$.

• Let $S^2 = \{\sigma^2_1, \ldots, \sigma^2_{N_2}\} = \delta(S^1, w_2)$. Since $w_2 = w_{\sigma^1_1, \sigma^1_2}$ satisfies either condition (1), (2), or (3), we know $N_2 < N_1$. Take $w_3 = w_{\sigma^2_1, \sigma^2_2}$.

Repeat until $N_m = 1$ for some $m$. Note that this must happen after a finite number of steps since $N = N_0$ is finite and $N_0 > N_1 > N_2 > \cdots$.

By this construction $w = w_1 w_2 \ldots w_m \in \mathcal{L}(M)$ is a sync word for $M$. After observing $w$, an observer knows the machine must be in state $\sigma^m_1$. □

B. A Test for Exactness

We can now provide an algorithmic test for exactness using the characterization theorem of exact machines. We begin with subalgorithms to test for topological distinctness and path convergence of state pairs. Both are essentially the same algorithm and only a slight modification of the deterministic finite-automaton (DFA) table-filling algorithm to test for pairs of equivalent states [10].

Algorithm 1. Test States for Topological Distinctness.

1. Initialization: Create a table containing boxes for all pairs of distinct states $(\sigma_k, \sigma_j)$. Initially, all boxes are blank. Then,

   Loop over distinct pairs $(\sigma_k, \sigma_j)$
     Loop over $x \in \mathcal{A}$
       If $\{x \in \mathcal{L}(M, \sigma_k)$ but $x \notin \mathcal{L}(M, \sigma_j)\}$
         or $\{x \in \mathcal{L}(M, \sigma_j)$ but $x \notin \mathcal{L}(M, \sigma_k)\}$,
       then mark box for pair $(\sigma_k, \sigma_j)$.
     end
   end

2. Induction: If $\delta(\sigma_k, x) = \sigma_{k'}$, $\delta(\sigma_j, x) = \sigma_{j'}$, and the box for pair $(\sigma_{k'}, \sigma_{j'})$ is already marked, then mark the box for pair $(\sigma_k, \sigma_j)$. Repeat until no more inductions are possible.

Algorithm 2. Test States for Path Convergence.

This algorithm is identical to Algorithm 1 except that the if-statement in the initialization step is replaced with the following:

   If $x \in \mathcal{L}(M, \sigma_k) \cap \mathcal{L}(M, \sigma_j)$ and $\delta(\sigma_k, x) = \delta(\sigma_j, x)$, then mark box for pair $(\sigma_k, \sigma_j)$.

With Algorithm 1, all pairs of topologically distinct states end up with marked boxes. With Algorithm 2, all pairs of path convergent states end up with marked boxes. These facts can be proved, respectively, by using induction on the length of the minimal distinguishing or path-converging word $w$ for a given pair of states. The proofs are virtually identical to the proof of the standard DFA table-filling algorithm, so the details have been omitted.
Note also that both of these are polynomial-time algorithms. Step (1) has run time $O(|\mathcal{A}| N^2)$. The inductions in Step (2), if done in a reasonably efficient fashion, can also be completed in run time $O(|\mathcal{A}| N^2)$. (See, e.g., the analysis of the DFA table-filling algorithm in Ref. [10].) Therefore, the total run time of these algorithms is $O(|\mathcal{A}| N^2)$.
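Algorithms 1 and 2 differ only in their initialization test, so both can be sketched with one table-filling routine. This is an illustrative sketch, not the efficient $O(|\mathcal{A}| N^2)$ version discussed above: Step 2 is implemented as a naive repeat-until-no-change loop, which is still polynomial but slower. States are zero-based indices, pairs are stored as `(k, j)` with `k < j`, and the machine is assumed unifilar and given as a dict of matrices. The second example machine, `merge`, is hypothetical (not from the text): symbol `a` sends both of its states to state 0, so that pair is path convergent but not topologically distinct.

```python
from itertools import combinations

def delta_fn(T):
    """delta(sigma, x) for a unifilar machine; None when sigma cannot emit x."""
    def delta(s, x):
        targets = [j for j, prob in enumerate(T[x][s]) if prob > 0]
        return targets[0] if targets else None
    return delta

def table_fill(T, initial_test):
    """Step 1 marks pairs passing initial_test; Step 2 propagates marks backwards."""
    delta = delta_fn(T)
    n = len(next(iter(T.values())))
    pairs = list(combinations(range(n), 2))
    marked = {pair for pair in pairs if initial_test(delta, *pair)}
    changed = True
    while changed:                      # naive fixed-point version of Step 2
        changed = False
        for k, j in pairs:
            if (k, j) in marked:
                continue
            for x in T:
                a, b = delta(k, x), delta(j, x)
                if a is not None and b is not None and a != b and (min(a, b), max(a, b)) in marked:
                    marked.add((k, j))
                    changed = True
                    break
    return marked

def topologically_distinct_pairs(T):
    """Algorithm 1: initially mark pairs where exactly one state can emit some symbol x."""
    return table_fill(T, lambda d, k, j: any((d(k, x) is None) != (d(j, x) is None) for x in T))

def path_convergent_pairs(T):
    """Algorithm 2: initially mark pairs sent to the same state by some symbol x."""
    return table_fill(T, lambda d, k, j: any(d(k, x) is not None and d(k, x) == d(j, x) for x in T))

p = 0.5  # illustrative parameter value
even = {"0": [[p, 0.0], [0.0, 0.0]], "1": [[0.0, 1.0 - p], [1.0, 0.0]]}
# Hypothetical two-state machine in which symbol "a" sends both states to state 0.
merge = {"a": [[0.5, 0.0], [0.2, 0.0]], "b": [[0.0, 0.5], [0.0, 0.8]]}
print(topologically_distinct_pairs(even), path_convergent_pairs(even))    # {(0, 1)} set()
print(topologically_distinct_pairs(merge), path_convergent_pairs(merge))  # set() {(0, 1)}
```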

Algorithm 3. Test for Exactness.

1. Use Algorithm 1 to find all pairs of topologically distinct states.

2. Use Algorithm 2 to find all pairs of path convergent states.

3. Loop over all pairs of distinct states $(\sigma_k, \sigma_j)$ to check if they are either (i) topologically distinct or (ii) path convergent. By Thm. 3, if all distinct pairs of states satisfy (i) or (ii) or both, the machine is exact, and otherwise it is not.

This, too, is a polynomial-time algorithm. Steps (1) and (2) have run time $O(|\mathcal{A}| N^2)$. Step (3) has run time $O(N^2)$. Hence, the total run time for this algorithm is $O(|\mathcal{A}| N^2)$.
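Putting the three steps together gives the sketch below. As with the table-filling sketch, it assumes a unifilar machine stored as a dict of matrices with zero-based state indices, uses a naive fixed-point loop for the induction step rather than the efficient bookkeeping the run-time analysis assumes, and `is_exact` is a hypothetical name. On the two running examples it reports that the Even Process machine is exact and the ABC machine is not, in agreement with Sec. II E.

```python
from itertools import combinations

def is_exact(T):
    """Algorithm 3 (sketch): exact iff every pair of distinct states is
    topologically distinct or path convergent (Thm. 3). Assumes unifilarity."""
    n = len(next(iter(T.values())))
    pairs = list(combinations(range(n), 2))

    def delta(s, x):
        targets = [j for j, prob in enumerate(T[x][s]) if prob > 0]
        return targets[0] if targets else None

    def fill(initial_test):
        # Steps 1-2 of Algorithms 1 and 2: initial marks, then backward propagation.
        marked = {pair for pair in pairs if initial_test(*pair)}
        changed = True
        while changed:
            changed = False
            for k, j in pairs:
                if (k, j) in marked:
                    continue
                for x in T:
                    a, b = delta(k, x), delta(j, x)
                    if a is not None and b is not None and a != b and (min(a, b), max(a, b)) in marked:
                        marked.add((k, j))
                        changed = True
                        break
        return marked

    topo = fill(lambda k, j: any((delta(k, x) is None) != (delta(j, x) is None) for x in T))
    conv = fill(lambda k, j: any(delta(k, x) is not None and delta(k, x) == delta(j, x) for x in T))
    return all(pair in topo or pair in conv for pair in pairs)   # Step 3

p, q = 0.5, 0.7  # illustrative parameter values
even = {"0": [[p, 0.0], [0.0, 0.0]], "1": [[0.0, 1.0 - p], [1.0, 0.0]]}
abc = {"0": [[0.0, 1.0 - p], [1.0 - q, 0.0]], "1": [[0.0, p], [q, 0.0]]}
print(is_exact(even), is_exact(abc))   # True False
```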

V. CONCLUSION

We have analyzed the process of exact synchronization to finite-state ε-machines. In particular, we showed that for exact machines an observer synchronizes exponentially fast. As a result, the average uncertainty $h_\mu(L)$ in an observer's predictions converges exponentially fast to the machine's entropy rate, a phenomenon first reported for subshifts estimated from maps of the interval [11]. Additionally, we found an efficient (polynomial-time) algorithm to test ε-machines for exactness.

In Ref. [7] we similarly analyze asymptotic synchronization to nonexact ε-machines. It turns out that qualitatively similar results hold. That is, $U(L)$ and $h_\mu(L)$ both converge to their respective limits exponentially fast. However, the proof methods in the nonexact case are substantially different.

In the future we plan to extend these results to more general model classes, such as ε-machines with a countable number of states and nonunifilar hidden Markov machines.

Appendix A

We construct the possibility machine $\widetilde{M}$ for the three-state ε-machine shown in Fig. 5. The result is shown in Fig. 6.

FIG. 5: A three-state ε-machine $M$ with alphabet $\mathcal{A} = \{a, b, c\}$.
FIG. 6: The possibility machine $\widetilde{M}$ for the three-state ε-machine $M$ of Fig. 5. The state names have been abbreviated for display purposes: e.g., $(\sigma_1, \{\sigma_1, \sigma_2, \sigma_3\}) \to (1, 123)$.
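A minimal sketch of the construction follows. It assumes, as the state names in Fig. 6 suggest, that each state of $\widetilde{M}$ is a pair (true state, set of states the observer still considers possible), that the observer starts from the full state set, and that only allowed transitions are followed. Transition probabilities are omitted. The function name `possibility_machine` and the `delta` encoding (`(state, symbol) -> successor`, present only for allowed symbols) are hypothetical, and the two-state topology used below is an illustrative assumption, not the machine of Fig. 5.

```python
from collections import deque


def possibility_machine(states, symbols, delta):
    """Hypothetical construction: each node is (true state, frozenset of
    states the observer still considers possible). Returns the reachable
    nodes and the labeled edges between them."""
    start = [(s, frozenset(states)) for s in states]  # observer knows nothing yet
    seen = set(start)
    edges = []
    queue = deque(start)
    while queue:
        sigma, possible = queue.popleft()
        for x in symbols:
            if (sigma, x) not in delta:
                continue  # the true state cannot emit x
            # Unifilarity: each candidate that can emit x has one successor.
            target = (delta[sigma, x],
                      frozenset(delta[s, x] for s in possible if (s, x) in delta))
            edges.append(((sigma, possible), x, target))
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen, edges


golden_mean = {("A", "1"): "A", ("A", "0"): "B", ("B", "1"): "A"}
nodes, edges = possibility_machine(["A", "B"], ["0", "1"], golden_mean)
# Nodes whose possibility set is a singleton are the synchronized ones.
synchronized = {node for node in nodes if len(node[1]) == 1}
print(len(nodes), len(edges), sorted(s for s, _ in synchronized))
```

For this topology the sketch finds four nodes: the two unsynchronized start nodes $(A, \{A, B\})$ and $(B, \{A, B\})$, and the two synchronized nodes $(A, \{A\})$ and $(B, \{B\})$.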
Appendix B



We prove Eq. (18) in Sec. III B, restated here as Lemma 1.



Lemma 1. For any exact ε-machine $M$,

$$\lim_{L\to\infty} \|\pi_B B^L\|_1^{1/L} = r(B) . \tag{B1}$$



In what follows, $A$ denotes an arbitrary $m \times m$ matrix, and $\vec{v}$ and $\vec{w}$ denote row $m$-vectors. Unless otherwise specified, the entries of matrices and vectors are assumed to be complex.

Definition 13. The (left) matrix $p$-norms ($1 \le p \le \infty$) are defined as:

$$\|A\|_p = \max\{\|\vec{v}A\|_p : \|\vec{v}\|_p = 1\} . \tag{B2}$$

The following facts will be used in our proof. 

Fact 1. If $A$ is a matrix with real nonnegative entries and $\vec{v} = (v_1, \ldots, v_m)$ is a vector with real nonnegative entries, then:

$$\|\vec{v}A\|_1 = \sum_{k=1}^{m} \|(v_k \vec{e}_k) A\|_1 , \tag{B3}$$

where $\vec{e}_k = (0, \ldots, 1, \ldots, 0)$ is the $k$th standard basis vector.

Fact 2. Let $A$ be a matrix with real nonnegative entries, let $\vec{v} = (v_1, \ldots, v_m)$ be a vector with complex entries, and let $\vec{w} = (w_1, \ldots, w_m) = (|v_1|, \ldots, |v_m|)$. Then:

$$\|\vec{v}A\|_1 \le \|\vec{w}A\|_1 . \tag{B4}$$

Fact 3. For any matrix $A = \{a_{ij}\}$, the matrix 1-norm is the largest absolute row sum:

$$\|A\|_1 = \max_i \sum_j |a_{ij}| . \tag{B5}$$



Fact 4. For any matrix $A$, $L \in \mathbb{N}$, and $1 \le p \le \infty$:

$$\|A^L\|_p \le \|A\|_p^L . \tag{B6}$$
Fact 5. For any matrix $A$ and $1 \le p \le \infty$:

$$\lim_{L\to\infty} \|A^L\|_p^{1/L} = r(A) , \tag{B7}$$

where $r(A)$ is the (left) spectral radius of $A$:

$$r(A) = \max\{|\lambda| : \lambda \text{ is a (left) eigenvalue of } A\} . \tag{B8}$$

(This is, of course, the same as the right spectral radius, but we emphasize the left eigenvalues for the proof of Lemma 1 below.)



Fact 1 can be proved by direct computation, and Fact 2 follows from the triangle inequality. Fact 3 is a standard result from linear algebra. Facts 4 and 5 are finite-dimensional versions of more general results established in Ref. [12] for bounded linear operators on Banach spaces.
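The five facts are easy to sanity-check numerically. The pure-Python sketch below uses an arbitrary example matrix (an assumption for illustration, not a matrix from the paper) that is upper triangular, so its eigenvalues are its diagonal entries and $r(A) = 0.5$. The helper names are hypothetical; `vec_mat` multiplies a row vector on the left of the matrix, matching the left norms of Definition 13.

```python
import random


def vec_mat(v, A):
    """Row vector times matrix (left action): (vA)_j = sum_k v_k A[k][j]."""
    return [sum(v[k] * A[k][j] for k in range(len(v))) for j in range(len(A[0]))]


def norm1(v):
    """Vector 1-norm; abs() also handles complex entries."""
    return sum(abs(x) for x in v)


def mat_mul(A, C):
    return [vec_mat(row, C) for row in A]


def mat_pow(A, L):
    n = len(A)
    P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(L):
        P = mat_mul(P, A)
    return P


def left_norm1(A):
    """Fact 3: the left matrix 1-norm is the largest absolute row sum."""
    return max(norm1(row) for row in A)


# Example nonnegative, substochastic matrix; upper triangular, so r(A) = 0.5.
A = [[0.5, 0.25], [0.0, 0.3]]

# Fact 1: for nonnegative v, ||vA||_1 splits into sum_k ||(v_k e_k) A||_1.
v = [0.2, 0.7]
parts = [vec_mat([v[k] if i == k else 0.0 for i in range(2)], A) for k in range(2)]
fact1_gap = abs(norm1(vec_mat(v, A)) - sum(norm1(p) for p in parts))

# Fact 2: replacing complex entries by their moduli cannot decrease ||vA||_1.
u = [0.3 + 0.4j, -0.6j]
w = [abs(x) for x in u]
fact2_ok = norm1(vec_mat(u, A)) <= norm1(vec_mat(w, A)) + 1e-12

# Fact 3 (consistency check): ||vA||_1 <= ||A||_1 ||v||_1 for random v.
random.seed(0)
samples = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(100)]
fact3_ok = all(norm1(vec_mat(r, A)) <= left_norm1(A) * norm1(r) + 1e-12 for r in samples)

# Fact 4: ||A^L||_1 <= ||A||_1^L.
fact4_ok = all(left_norm1(mat_pow(A, L)) <= left_norm1(A) ** L + 1e-12 for L in range(1, 20))

# Fact 5 (Gelfand's formula): ||A^L||_1^{1/L} approaches r(A) = 0.5.
L = 200
gelfand_estimate = left_norm1(mat_pow(A, L)) ** (1 / L)
print(fact1_gap, fact2_ok, fact3_ok, fact4_ok, gelfand_estimate)
```

Convergence in Fact 5 is slow: for this matrix $\|A^L\|_1 \approx 2.25 \cdot 0.5^L$, so the estimate exceeds $0.5$ by the factor $2.25^{1/L}$.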

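Using these facts, we now prove Lemma 1.

Proof. Since $\|\pi_B B^L\|_1 \le \|\pi_B\|_1 \, \|B^L\|_1$ by Definition 13, Fact 5 gives:

$$\limsup_{L\to\infty} \|\pi_B B^L\|_1^{1/L} \le r(B) . \tag{B9}$$

Thus, it suffices to show that:

$$\liminf_{L\to\infty} \|\pi_B B^L\|_1^{1/L} \ge r(B) . \tag{B10}$$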

Let us define the B-machine to be the restriction of the $\widetilde{M}$ machine to its nonrecurrent states. The state-to-state transition matrix for this machine is $B$. We call the states of this machine B-states and refer to paths in the associated graph as B-paths. Note that the rows of $B = \{b_{ij}\}$ are substochastic:

$$\sum_j b_{ij} \le 1 \tag{B11}$$

for all $i$, with strict inequality for at least one value of $i$ as long as $M$ has more than one state.

By the construction of the B-machine, we know that for each of its states $q_j$ there exists some initial state $q_{i(j)}$ such that $q_j$ is accessible from $q_{i(j)}$. Define $l_j$ to be the length of the shortest B-path from $q_{i(j)}$ to $q_j$, and $l_{\max} = \max_j l_j$. Let $c_j > 0$ be the probability, according to the initial distribution $\pi_B$, of both starting in state $q_{i(j)}$ at time 0 and ending in state $q_j$ at time $l_j$:

$$c_j = \left(\pi_{i(j)} \vec{e}_{i(j)} B^{l_j}\right)_j .$$

Finally, let $C_1 = \min_j c_j$.

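Then, for any $L > l_{\max}$ and any state $q_j$ we have:

\begin{align}
\|\pi_B B^L\|_1 &\ge \|\pi_{i(j)} \vec{e}_{i(j)} B^L\|_1 \tag{B12} \\
&= \|(\pi_{i(j)} \vec{e}_{i(j)} B^{l_j}) B^{L-l_j}\|_1 \tag{B13} \\
&\ge \|c_j \vec{e}_j B^{L-l_j}\|_1 \tag{B14} \\
&\ge C_1 \|\vec{e}_j B^{L-l_j}\|_1 \tag{B15} \\
&\ge C_1 \|\vec{e}_j B^{L}\|_1 . \tag{B16}
\end{align}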



Equation (B12) follows from Fact 1. The decomposition in Eq. (B13) is possible since $L > l_{\max} \ge l_j$. Equation (B14) follows from Fact 1 and the definition of $c_j$. Equation (B15) follows from the definition of $C_1$. Finally, Eq. (B16) follows from Facts 3 and 4 and Eq. (B11), which together give $\|B^{l_j}\|_1 \le \|B\|_1^{l_j} \le 1$.

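Now, take a normalized (left) eigenvector $\vec{y} = (y_1, \ldots, y_n)$ of $B$ whose associated eigenvalue is maximal in modulus. That is, $\|\vec{y}\|_1 = 1$, $\vec{y}B = \lambda\vec{y}$, and $|\lambda| = r(B)$. Define $\vec{z} = (z_1, \ldots, z_n) = (|y_1|, \ldots, |y_n|)$. Then, for any $L \in \mathbb{N}$:

\begin{align}
\sum_{k=1}^{n} \|z_k \vec{e}_k B^L\|_1 &= \|\vec{z} B^L\|_1 \tag{B17} \\
&\ge \|\vec{y} B^L\|_1 \tag{B18} \\
&= \|\lambda^L \vec{y}\|_1 \tag{B19} \\
&= |\lambda|^L \|\vec{y}\|_1 \tag{B20} \\
&= r(B)^L , \tag{B21}
\end{align}

where Eq. (B17) follows from Fact 1 and Eq. (B18) from Fact 2. Since at least one of the $n$ nonnegative terms in the sum must be at least the average, for each $L$ there exists some $j = j(L)$ in $\{1, \ldots, n\}$ such that:

$$z_{j(L)} \|\vec{e}_{j(L)} B^L\|_1 \ge \frac{r(B)^L}{n} . \tag{B22}$$

Now, $r(B)$ may be 0, but we can still choose the $j(L)$'s such that $z_{j(L)}$ is never zero. In this case, we may divide through by $z_{j(L)}$ on both sides of Eq. (B22) to obtain, for each $L$:

$$\|\vec{e}_{j(L)} B^L\|_1 \ge \frac{r(B)^L}{n \cdot z_{j(L)}} \ge C_2 \cdot r(B)^L , \tag{B23}$$

where $C_2 > 0$ is defined by:

$$C_2 = \min_{j \,:\, z_j \neq 0} \frac{1}{n \cdot z_j} .$$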

Therefore, for any $L > l_{\max}$ we know:

\begin{align}
\|\pi_B B^L\|_1 &\ge C_1 \cdot \|\vec{e}_{j(L)} B^L\|_1 \tag{B24} \\
&\ge C_1 \cdot \left(C_2 \cdot r(B)^L\right) \tag{B25} \\
&= C_3 \cdot r(B)^L , \tag{B26}
\end{align}



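where $C_3 = C_1 C_2$. Equation (B24) follows from Eq. (B16), and Eq. (B25) follows from Eq. (B23). Finally, since this holds for all $L > l_{\max}$ and $C_3^{1/L} \to 1$, we have:

$$\liminf_{L\to\infty} \|\pi_B B^L\|_1^{1/L} \ge \liminf_{L\to\infty} \left(C_3 \cdot r(B)^L\right)^{1/L} = r(B) , \tag{B27}$$

which is Eq. (B10). Together with Eq. (B9), this proves Lemma 1.

□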
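The quantities in the proof can be computed for a small example. In the sketch below, the matrix `B` and the initial vectors `pi` and `pi_bad` are illustrative assumptions, not taken from any figure, and `proof_constants` and `times_power` are hypothetical helpers. `proof_constants` finds $i(j)$ and $l_j$ by breadth-first search over the nonzero entries of $B$, then evaluates $c_j$ and $C_1$. The script checks Eq. (B16) over a range of $L$ and estimates the limit in Eq. (B1).

```python
from collections import deque


def vec_mat(v, B):
    """Row vector times matrix: (vB)_j = sum_k v_k B[k][j]."""
    return [sum(v[k] * B[k][j] for k in range(len(v))) for j in range(len(B[0]))]


def norm1(v):
    return sum(abs(x) for x in v)


def times_power(v, B, L):
    """Compute v B^L by L successive vector-matrix products."""
    for _ in range(L):
        v = vec_mat(v, B)
    return v


def proof_constants(B, pi):
    """Hypothetical helper returning (lengths, l_max, C_1) from the proof."""
    n = len(B)
    initial = [i for i in range(n) if pi[i] > 0]
    # Multi-source BFS over the support of B (edge i -> j iff B[i][j] > 0):
    # lengths[j] is a shortest path length to q_j from the initial states,
    # and source[j] is an initial state q_{i(j)} that achieves it.
    lengths = {i: 0 for i in initial}
    source = {i: i for i in initial}
    queue = deque(initial)
    while queue:
        i = queue.popleft()
        for j in range(n):
            if B[i][j] > 0 and j not in lengths:
                lengths[j] = lengths[i] + 1
                source[j] = source[i]
                queue.append(j)
    if len(lengths) < n:
        raise ValueError("some B-state is not accessible from an initial state")
    c = []
    for j in range(n):
        i = source[j]
        start = [pi[i] if k == i else 0.0 for k in range(n)]  # pi_{i(j)} e_{i(j)}
        c.append(times_power(start, B, lengths[j])[j])       # c_j
    return lengths, max(lengths.values()), min(c)


B = [[0.5, 0.25], [0.0, 0.3]]  # upper triangular, so r(B) = 0.5
pi = [1.0, 0.0]
lengths, l_max, C1 = proof_constants(B, pi)

# Check Eq. (B16): ||pi B^L||_1 >= C_1 ||e_j B^L||_1 for L > l_max and all j.
for L in range(l_max + 1, 40):
    lhs = norm1(times_power(pi, B, L))
    for j in range(len(B)):
        e_j = [1.0 if k == j else 0.0 for k in range(len(B))]
        assert lhs >= C1 * norm1(times_power(e_j, B, L))

L = 400
rate = norm1(times_power(pi, B, L)) ** (1 / L)  # about 0.501, near r(B) = 0.5

pi_bad = [0.0, 1.0]  # state 0 is not accessible from state 1
rate_bad = norm1(times_power(pi_bad, B, L)) ** (1 / L)  # about 0.3, not r(B)
print(C1, l_max, rate, rate_bad)
```

The second initial vector puts all its mass on a state from which the other state is not accessible. `proof_constants` rejects it, and the estimated rate drops to $0.3$ instead of $r(B) = 0.5$. This is why the proof needs every B-state to be accessible from some initial state.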